**True Destination Locator**

Team Name: Sidharth Kumar Mohanty

1. Introduction

Objective

The objective of this project is to find the best neighbourhood in Toronto (a city in Canada) in which to open a start-up or an Italian restaurant, using Foursquare location data. The aim is to recommend locations that carry low risk and offer a high chance of success.

Target Audience

  • Business people who want to invest in or open a start-up company or restaurant.
  • Bachelors who want to live in a part of the city that offers every facility they need, such as a gym, playground, parlour or movie theatre.
  • Freelancers who would like to run their own small company or restaurant as a side business.
  • Marketing companies that want to launch a new product in the best location.
  • Researchers who want to set up a camp for a survey.
  • Tourists who want to eat Italian food.

Data Description

For this project we need the following data:

  1. Toronto city data containing the postal codes, boroughs and neighborhoods (their latitudes and longitudes come from the second source)
  • Data Source: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
  • Description: This Wikipedia page contains all the information we need to explore and cluster the neighborhoods in Toronto. We need to scrape the page, wrangle and clean the data, and then load it into a pandas dataframe so that it is in a structured format.
  2. Geographical location data using the Geocoder package
  • Data Source: https://cocl.us/Geospatial_data
  • Description: This source provides the geographical coordinates of the neighbourhoods for their respective postal codes.
  3. Venue data using the Foursquare API

Tech Stack Used

Machine Learning, Web Scraping, Foursquare API, Geocoder, Beautiful Soup, Folium

Table of Contents

  1. Introduction
  2. Import Libraries
  3. Scrape Neighborhoods data
  4. Data Pre-processing
  5. Data Analysis
  6. Clustering
  7. Map Visualization
  8. Conclusion
  9. Future Work

2. Import Libraries

In [1]:
# install geopy to access geocoder package
!pip install geopy
Requirement already satisfied: geopy in c:\programdata\anaconda3\lib\site-packages (2.1.0)
Requirement already satisfied: geographiclib<2,>=1.49 in c:\programdata\anaconda3\lib\site-packages (from geopy) (1.50)
In [2]:
# install beautifulsoup4 for web scraping
!pip install beautifulsoup4
Requirement already satisfied: beautifulsoup4 in c:\programdata\anaconda3\lib\site-packages (4.9.1)
Requirement already satisfied: soupsieve>1.2 in c:\programdata\anaconda3\lib\site-packages (from beautifulsoup4) (2.0.1)
In [3]:
# install requests to fetch web pages over HTTP
!pip install requests
Requirement already satisfied: requests in c:\users\sidharth\appdata\roaming\python\python38\site-packages (2.25.1)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\programdata\anaconda3\lib\site-packages (from requests) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\programdata\anaconda3\lib\site-packages (from requests) (2020.6.20)
Requirement already satisfied: idna<3,>=2.5 in c:\programdata\anaconda3\lib\site-packages (from requests) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\programdata\anaconda3\lib\site-packages (from requests) (1.25.9)
In [4]:
# install kmeans for clustering
# Note: this standalone "kmeans" package fails to build here (it needs Microsoft Visual C++ 14.0)
# and is not needed; clustering uses KMeans from scikit-learn, imported in the next cell.
!pip install kmeans
Collecting kmeans
  Downloading kmeans-1.0.2.tar.gz (5.9 kB)
Building wheels for collected packages: kmeans
  Building wheel for kmeans (setup.py): finished with status 'error'
  error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/
  ERROR: Failed building wheel for kmeans
  Running setup.py clean for kmeans
Failed to build kmeans
Installing collected packages: kmeans
    Running setup.py install for kmeans: finished with status 'error'
In [5]:
# install folium for visualization
!pip install folium
Requirement already satisfied: folium in c:\programdata\anaconda3\lib\site-packages (0.12.1)
Requirement already satisfied: branca>=0.3.0 in c:\programdata\anaconda3\lib\site-packages (from folium) (0.4.2)
Requirement already satisfied: numpy in c:\programdata\anaconda3\lib\site-packages (from folium) (1.18.5)
Requirement already satisfied: requests in c:\users\sidharth\appdata\roaming\python\python38\site-packages (from folium) (2.25.1)
Requirement already satisfied: jinja2>=2.9 in c:\programdata\anaconda3\lib\site-packages (from folium) (2.11.2)
Requirement already satisfied: MarkupSafe>=0.23 in c:\programdata\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (1.1.1)
Requirement already satisfied: certifi>=2017.4.17 in c:\programdata\anaconda3\lib\site-packages (from requests->folium) (2020.6.20)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\programdata\anaconda3\lib\site-packages (from requests->folium) (1.25.9)
Requirement already satisfied: idna<3,>=2.5 in c:\programdata\anaconda3\lib\site-packages (from requests->folium) (2.10)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\programdata\anaconda3\lib\site-packages (from requests->folium) (3.0.4)
In [6]:
# import all necessary libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from bs4 import BeautifulSoup # library to parse HTML for web scraping
import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means from scikit-learn for the clustering stage
from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

print('Libraries imported.')
Libraries imported.

3. Scrape Neighborhoods Data

Since no ready-made dataset is available, we will build one containing all the neighborhoods of Toronto by web scraping the Wikipedia page.
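The scraping below relies on BeautifulSoup. As a dependency-free illustration of the same idea, this sketch uses Python's built-in `html.parser` to collect the text of the `<p>` and `<span>` inside each `<td>`. The HTML snippet is a simplified, hypothetical stand-in for a row of the Wikipedia table (not its exact markup), and the class name `TdTextCollector` is made up for this example.

```python
from html.parser import HTMLParser


class TdTextCollector(HTMLParser):
    """Collect the text of the <p> and <span> inside every <td> cell."""

    def __init__(self):
        super().__init__()
        self.cells = []    # one dict per <td>: {'p': ..., 'span': ...}
        self._stack = []   # open td/p/span tags, innermost last

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.cells.append({'p': '', 'span': ''})
        if tag in ('td', 'p', 'span'):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        # add text to the innermost open <p> or <span> of the current cell;
        # text inside nested tags such as <b> or <a> is attributed to it too
        for tag in reversed(self._stack):
            if tag in ('p', 'span') and self.cells:
                self.cells[-1][tag] += data
                break


# hypothetical, simplified markup for two postal-code cells
SAMPLE_HTML = """
<table>
  <tr>
    <td><p><b>M3A</b></p><span><a href="#">North York</a>(<a href="#">Parkwoods</a>)</span></td>
    <td><p><b>M1A</b></p><span>Not assigned</span></td>
  </tr>
</table>
"""

parser = TdTextCollector()
parser.feed(SAMPLE_HTML)
parser.close()

# keep only cells with an assigned borough, as the scraping loop does
assigned = [c for c in parser.cells if c['span'] != 'Not assigned']
print(assigned)  # [{'p': 'M3A', 'span': 'North York(Parkwoods)'}]
```

BeautifulSoup remains the more convenient tool for the real page; the sketch only shows what "grab the text of each cell" amounts to.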

In [7]:
# Get the neighborhood data from Wikipedia with requests and Beautiful Soup
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
result = requests.get(url)

# parse the HTML so that the postal-code table can be searched
soup = BeautifulSoup(result.content, 'html.parser')
In [8]:
# Scrape the neighborhood data from the first table on the Wikipedia page.
# Each <td> cell holds one postal code in a <p> tag and the borough plus its
# neighborhoods in a <span>, e.g. "North York(Parkwoods)".
table_contents = []
table = soup.find('table')
for row in table.find_all('td'):
    if row.span.text == 'Not assigned':
        continue  # skip postal codes that have no borough
    cell = {}
    # Create three columns named "PostalCode", "Borough" and "Neighborhood"
    cell['PostalCode'] = row.p.text[:3]  # keep only the first three characters of the <p> text (e.g. M3A)
    cell['Borough'] = row.span.text.split('(')[0]
    # remove "(", ")" and " /" from the neighborhood part, e.g. "(Parkview Hill / Woodbine Gardens)"
    cell['Neighborhood'] = (row.span.text.split('(')[1]
                            .strip(')')
                            .replace(' /', ',')
                            .replace(')', ' ')
                            .strip(' '))
    table_contents.append(cell)

df = pd.DataFrame(table_contents)
# shorten some long borough names that were scraped together with extra text
df['Borough'] = df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
                                       'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
                                       'EtobicokeNorthwest': 'Etobicoke Northwest', 'East YorkEast Toronto': 'East York/East Toronto',
                                       'MississaugaCanada Post Gateway Processing Centre': 'Mississauga'})
df.head()
Out[8]:
   PostalCode  Borough           Neighborhood
0  M3A         North York        Parkwoods
1  M4A         North York        Victoria Village
2  M5A         Downtown Toronto  Regent Park, Harbourfront
3  M6A         North York        Lawrence Manor, Lawrence Heights
4  M7A         Queen's Park      Ontario Provincial Government

This is the dataset we are going to use. It has three columns: "PostalCode", "Borough" and "Neighborhood". Because the scraped data is unstructured and dirty, it needs some pre-processing before it can be used.
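The string cleaning in the scraping loop is easier to check when it is isolated in a small pure function. The sketch below, with the hypothetical name `parse_cell`, mirrors that logic step for step and can be tried on sample strings shaped like the scraped cell text, without fetching the page.

```python
def parse_cell(p_text, span_text):
    """Split one scraped cell into (PostalCode, Borough, Neighborhood).

    Mirrors the cleaning in the scraping loop: the postal code is the first
    three characters of the <p> text, the borough is the text before the
    first "(", and the neighborhoods are the text after it, with " /"
    separators turned into commas and stray parentheses removed. Like the
    loop, it raises IndexError if the span text contains no "(".
    """
    parts = span_text.split('(')
    postal_code = p_text[:3]
    borough = parts[0]
    neighborhood = (parts[1]
                    .strip(')')
                    .replace(' /', ',')
                    .replace(')', ' ')
                    .strip(' '))
    return postal_code, borough, neighborhood


# sample strings in the cell format (not taken from the live page)
print(parse_cell('M4B\n', 'East York(Parkview Hill / Woodbine Gardens)'))
# ('M4B', 'East York', 'Parkview Hill, Woodbine Gardens')
```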

In [9]:
# save this dataframe in a CSV file
df.to_csv('Neighborhood Data.csv')

4. Data Pre-processing

In this step we do the following:

  • Only process the cells that have an assigned borough; ignore cells whose borough is Not assigned.
  • More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows are combined into one row, with the neighborhoods separated by a comma, as shown for M5A (row 2) in the table above.
  • If a cell has a borough but a Not assigned neighborhood, the neighborhood is set to the same name as the borough.
  • Document the work and any assumptions in Markdown cells.
  • Use the .shape attribute to print the number of rows in the dataframe.
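The combining and "Not assigned" rules above can be sketched in plain Python. This is an illustration on hypothetical sample rows, not the notebook's own code; in pandas the combining step is commonly written as `df.groupby(['PostalCode', 'Borough'], as_index=False)['Neighborhood'].agg(', '.join)`.

```python
def apply_preprocessing_rules(rows):
    """Apply the pre-processing rules to (postal_code, borough, neighborhood) rows.

    - rows whose borough is 'Not assigned' are dropped
    - a 'Not assigned' neighborhood is replaced by the borough name
    - rows that share a postal code are merged, neighborhoods joined by ', '
    """
    combined = {}  # postal code -> (borough, [neighborhoods]); dicts keep insertion order
    for code, borough, neighborhood in rows:
        if borough == 'Not assigned':
            continue
        if neighborhood == 'Not assigned':
            neighborhood = borough
        if code in combined:
            combined[code][1].append(neighborhood)
        else:
            combined[code] = (borough, [neighborhood])
    return [(code, borough, ', '.join(names)) for code, (borough, names) in combined.items()]


# hypothetical rows in the older "one row per neighborhood" layout
sample_rows = [
    ('M5A', 'Downtown Toronto', 'Regent Park'),
    ('M5A', 'Downtown Toronto', 'Harbourfront'),
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M0X', 'Sample Borough', 'Not assigned'),
]
for row in apply_preprocessing_rules(sample_rows):
    print(row)
# ('M5A', 'Downtown Toronto', 'Regent Park, Harbourfront')
# ('M0X', 'Sample Borough', 'Sample Borough')
```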
In [10]:
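# drop rows with null values, and rows whose postal code or borough is "Not assigned"
df_dropna = df.dropna()
empty = 'Not assigned'
df_dropna = df_dropna[(df_dropna.PostalCode != empty) & (df_dropna.Borough != empty)].reset_index(drop=True)
# a row with a borough but a "Not assigned" neighborhood takes the borough's name as its neighborhood
df_dropna.loc[df_dropna.Neighborhood == empty, 'Neighborhood'] = df_dropna.Borough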
In [11]:
# check for missing value
df_dropna.isnull().sum()
Out[11]:
PostalCode      0
Borough         0
Neighborhood    0
dtype: int64
In [12]:
# Check that no Boroughs are still Not assigned
df_dropna.loc[df_dropna['Borough'].isin(["Not assigned"])]
Out[12]:
PostalCode Borough Neighborhood
In [13]:
df = df_dropna
df.head()
Out[13]:
   PostalCode  Borough           Neighborhood
0  M3A         North York        Parkwoods
1  M4A         North York        Victoria Village
2  M5A         Downtown Toronto  Regent Park, Harbourfront
3  M6A         North York        Lawrence Manor, Lawrence Heights
4  M7A         Queen's Park      Ontario Provincial Government
In [14]:
# shape of dataframe
df.shape
Out[14]:
(103, 3)

The data is now clean and meets all the requirements above, so the only thing missing is the latitude and longitude of each location.

To use the Foursquare location data we need the latitude and longitude coordinates of each neighborhood. We therefore load a table of latitudes and longitudes for the different postal codes.

In [15]:
# get the latitude and the longitude coordinates of each Postal code
geo_url = "https://cocl.us/Geospatial_data"

geo_df = pd.read_csv(geo_url)
geo_df.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
geo_df.head()
Out[15]:
   PostalCode  Latitude   Longitude
0  M1B         43.806686  -79.194353
1  M1C         43.784535  -79.160497
2  M1E         43.763573  -79.188711
3  M1G         43.770992  -79.216917
4  M1H         43.773136  -79.239476

Now we merge the geographical dataframe with the neighborhood dataframe on the PostalCode column.

In [16]:
# Merging the Data
df = pd.merge(df, geo_df, on='PostalCode')
df.head()
Out[16]:
   PostalCode  Borough           Neighborhood                      Latitude   Longitude
0  M3A         North York        Parkwoods                         43.753259  -79.329656
1  M4A         North York        Victoria Village                  43.725882  -79.315572
2  M5A         Downtown Toronto  Regent Park, Harbourfront         43.654260  -79.360636
3  M6A         North York        Lawrence Manor, Lawrence Heights  43.718518  -79.464763
4  M7A         Queen's Park      Ontario Provincial Government     43.662301  -79.389494
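`pd.merge` joins on PostalCode with `how='inner'` by default, so a postal code with no row in `geo_df` would be dropped without any warning. The plain-Python sketch below (hypothetical helper name and sample data) shows what an inner join keeps and what it discards. In pandas the same check can be made by passing `how='left', indicator=True` to `pd.merge` and looking for rows whose `_merge` value is `'left_only'`, or simply by comparing `df.shape` before and after the merge.

```python
def inner_join_on_postal_code(neighborhoods, coordinates):
    """Inner-join (postal_code, borough, neighborhood) rows with {postal_code: (lat, lon)}.

    Returns the joined rows and the postal codes that found no coordinates,
    i.e. the rows an inner join silently drops.
    """
    joined, unmatched = [], []
    for code, borough, neighborhood in neighborhoods:
        if code in coordinates:
            lat, lon = coordinates[code]
            joined.append((code, borough, neighborhood, lat, lon))
        else:
            unmatched.append(code)
    return joined, unmatched


neighborhood_rows = [
    ('M3A', 'North York', 'Parkwoods'),
    ('M0X', 'Sample Borough', 'Sample Neighborhood'),  # hypothetical code with no coordinates
]
coordinates = {'M3A': (43.753259, -79.329656)}

joined, unmatched = inner_join_on_postal_code(neighborhood_rows, coordinates)
print(joined)                                     # [('M3A', 'North York', 'Parkwoods', 43.753259, -79.329656)]
print('dropped by the inner join:', unmatched)    # dropped by the inner join: ['M0X']
```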
In [17]:
# count the postal codes (rows) in each borough
df.groupby('Borough').count()['Neighborhood']
Out[17]:
Borough
Central Toronto            9
Downtown Toronto          17
Downtown Toronto Stn A     1
East Toronto               4
East Toronto Business      1
East York                  4
East York/East Toronto     1
Etobicoke                 11
Etobicoke Northwest        1
Mississauga                1
North York                24
Queen's Park               1
Scarborough               17
West Toronto               6
York                       5
Name: Neighborhood, dtype: int64
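`count()` counts rows, and each row is one postal code that may list several comma-separated neighborhoods (for example "Regent Park, Harbourfront"). The numbers above are therefore postal codes per borough; they sum to 103, the number of rows in the dataframe. The sketch below, using hypothetical sample rows, contrasts the two counts.

```python
from collections import Counter


def counts_per_borough(rows):
    """Return (postal codes per borough, individual neighborhoods per borough).

    Each row is (borough, neighborhood_string), where the string may list
    several neighborhoods separated by ', '.
    """
    postal_codes, neighborhoods = Counter(), Counter()
    for borough, neighborhood in rows:
        postal_codes[borough] += 1
        neighborhoods[borough] += len(neighborhood.split(', '))
    return postal_codes, neighborhoods


sample = [
    ('Downtown Toronto', 'Regent Park, Harbourfront'),
    ('Downtown Toronto', 'Garden District, Ryerson'),
    ('North York', 'Parkwoods'),
]
by_code, by_name = counts_per_borough(sample)
print(dict(by_code))  # {'Downtown Toronto': 2, 'North York': 1}
print(dict(by_name))  # {'Downtown Toronto': 4, 'North York': 1}
```

On the real dataframe, the per-borough neighborhood count should be obtainable with `df['Neighborhood'].str.split(', ').str.len().groupby(df['Borough']).sum()`.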

4.1. Visualize all the boroughs in Toronto

In [18]:
df_toronto = df
df_toronto.head()
Out[18]:
   PostalCode  Borough           Neighborhood                      Latitude   Longitude
0  M3A         North York        Parkwoods                         43.753259  -79.329656
1  M4A         North York        Victoria Village                  43.725882  -79.315572
2  M5A         Downtown Toronto  Regent Park, Harbourfront         43.654260  -79.360636
3  M6A         North York        Lawrence Manor, Lawrence Heights  43.718518  -79.464763
4  M7A         Queen's Park      Ontario Provincial Government     43.662301  -79.389494
In [19]:
# Create a list and store all unique borough names
boroughs = df_toronto['Borough'].unique().tolist()
In [20]:
# Estimate the centre of Toronto as the mean Latitude/Longitude of all postal codes
lat_toronto = df_toronto['Latitude'].mean()
lon_toronto = df_toronto['Longitude'].mean()
print('The geographical coordinates of Toronto are {}, {}'.format(lat_toronto, lon_toronto))
The geographical coordinates of Toronto are 43.70460773398059, -79.39715291165048
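The map centre above is the plain arithmetic mean of the postal-code coordinates. That is a reasonable approximation at city scale, where the points cover a small area far from the poles and the antimeridian. A minimal sketch of the same calculation, applied to the five postal codes shown in the geospatial table above (M1B to M1H):

```python
def mean_center(points):
    """Return the arithmetic mean (lat, lon) of a list of (lat, lon) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


# the first five rows of the geospatial table (M1B, M1C, M1E, M1G, M1H)
sample = [
    (43.806686, -79.194353),
    (43.784535, -79.160497),
    (43.763573, -79.188711),
    (43.770992, -79.216917),
    (43.773136, -79.239476),
]
center_lat, center_lon = mean_center(sample)
print(round(center_lat, 6), round(center_lon, 6))
```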
In [21]:
# Assign a random color to each borough
borough_color = {}
for borough in boroughs:
    borough_color[borough]= '#%02X%02X%02X' % tuple(np.random.choice(range(256), size=3)) #Random color
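The random RGB triples above change on every run and can give two boroughs nearly identical colors. One alternative, sketched here with the standard-library `colorsys` module and a hypothetical helper name, spaces the hues evenly around the color wheel so the palette is reproducible and the colors stay well separated for a handful of boroughs.

```python
import colorsys


def distinct_hex_colors(names, saturation=0.75, value=0.9):
    """Map each name to a '#RRGGBB' color, with hues spaced evenly around the wheel.

    The mapping depends only on the order of `names`, so it is the same on every run.
    """
    n = max(len(names), 1)
    palette = {}
    for i, name in enumerate(names):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        palette[name] = '#%02X%02X%02X' % (int(r * 255), int(g * 255), int(b * 255))
    return palette


print(distinct_hex_colors(['North York', 'Downtown Toronto', 'Scarborough']))
```

In the notebook this could replace the random loop with `borough_color = distinct_hex_colors(boroughs)`.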
In [22]:
map_toronto = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=10.5)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], 
                                           df_toronto['Longitude'],
                                           df_toronto['Borough'], 
                                           df_toronto['Neighborhood']):
    label_text = borough + ' - ' + neighborhood
    label = folium.Popup(label_text)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=borough_color[borough],
        fill_color=borough_color[borough],
        fill_opacity=0.8).add_to(map_toronto)  
    
map_toronto
Out[22]:
(Folium map of Toronto: one circle marker per postal code, colored by borough)

4.2. Define the Foursquare credentials

In [23]:
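import os

# read credentials from environment variables (names chosen here) instead of hard-coding secrets
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20200514' # Foursquare API version (YYYYMMDD)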

print('Credentials Stored')
Credentials Stored

4.3. Now, let's get up to 100 venues within a radius of 500 meters of each neighborhood.
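The request asks Foursquare for venues within `radius=500` meters of each neighborhood point. To check how far each returned venue actually lies from that point, the distance between the neighborhood and venue coordinates can be computed with the haversine formula. A minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km (so results differ slightly from an ellipsoidal model):

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# distance between the Parkwoods and Victoria Village postal-code points from the table above
print(round(haversine_m(43.753259, -79.329656, 43.725882, -79.315572)))
```

Once `toronto_venues` has been built, a distance column (the name `'Distance (m)'` is hypothetical) could be added with `toronto_venues['Distance (m)'] = toronto_venues.apply(lambda r: haversine_m(r['Neighborhood Latitude'], r['Neighborhood Longitude'], r['Venue Latitude'], r['Venue Longitude']), axis=1)`.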

First, let's define a function that builds the GET request URL for each neighborhood, sends the request and collects the venues returned.
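The request URL below is assembled by string formatting. As an alternative, this sketch builds the same query with `urllib.parse.urlencode`, which percent-encodes each value and joins the pairs with `&`. The function name `build_explore_url` is hypothetical; the parameters are the ones the notebook already sends.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a venues/explore GET URL carrying the same parameters getNearbyVenues sends."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "lat,lng"; urlencode escapes the comma as %2C
        'radius': radius,                # search radius in meters
        'limit': limit,                  # maximum number of venues to return
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


print(build_explore_url('MY_ID', 'MY_SECRET', '20200514', 43.753259, -79.329656))
```

With requests, passing the dictionary directly as `requests.get(EXPLORE_ENDPOINT, params=params)` lets the library do the same encoding.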

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 100 # maximum number of venues returned by the Foursquare API per request
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL (uses the radius passed in by the caller)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue
        # (venues without a category are skipped to avoid an IndexError)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
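Because every call to `getNearbyVenues` hits the live API, the response-parsing step is easier to test offline. The sketch below, with the hypothetical name `flatten_venues`, extracts the same seven fields from one response, assuming the layout the function above relies on (response -> groups[0] -> items -> venue). The sample response is hand-made, not real Foursquare output, and venues with an empty categories list are skipped.

```python
def flatten_venues(name, lat, lng, response_json):
    """Turn one venues/explore JSON response into (neighborhood, lat, lng, venue,
    venue_lat, venue_lng, category) rows, skipping venues without a category."""
    items = response_json['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        if not venue.get('categories'):
            continue
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     venue['categories'][0]['name']))
    return rows


# hand-made response in the assumed layout
sample_response = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 43.75, 'lng': -79.33},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Mystery Spot', 'location': {'lat': 43.76, 'lng': -79.32},
               'categories': []}},
]}]}}

for row in flatten_venues('Parkwoods', 43.753259, -79.329656, sample_response):
    print(row)
# ('Parkwoods', 43.753259, -79.329656, 'Cafe A', 43.75, -79.33, 'Coffee Shop')
```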
In [25]:
# Get venues for all neighborhoods in our dataset
toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                latitudes=df_toronto['Latitude'],
                                longitudes=df_toronto['Longitude'])
Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
Golden Mile, Clairlea, Oakridge
York Mills, Silver Hills
Downsview West
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
North Park, Maple Leaf Park, Upwood Park
Humber Summit
Cliffside, Cliffcrest, Scarborough Village West
Willowdale, Newtonbrook
Downsview Central
Studio District
Bedford Park, Lawrence Manor East
Del Ray, Mount Dennis, Keelsdale and Silverthorn
Humberlea, Emery
Birch Cliff, Cliffside West
Willowdale South
Downsview Northwest
Lawrence Park
Roselawn
Runnymede, The Junction North
Weston
Dorset Park, Wexford Heights, Scarborough Town Centre
York Mills West
Davisville North
Forest Hill North & West
High Park, The Junction South
Westmount
Wexford, Maryvale
Willowdale West
North Toronto West
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Enclave of L4W
Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens
Agincourt
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Clarks Corners, Tam O'Shanter, Sullivan
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
New Toronto, Mimico South, Humber Bay Shores
South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens
Steeles West, L'Amoreaux West
Rosedale
Enclave of M5E
Alderwood, Long Branch
Clairville, Humberwood, Woodbine Downs, West Humber, Kipling Heights, Rexdale, Elms, Tandridge, Old Rexdale
Upper Rouge
St. James Town, Cabbagetown
First Canadian Place, Underground city
The Kingsway, Montgomery Road, Old Mill North
Church and Wellesley
Enclave of M4L
Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South East
Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West
In [26]:
toronto_venues.tail()
Out[26]:
Neighborhood Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
2120 Mimico NW, The Queensway West, South of Bloor,... 43.628841 -79.520999 Jim & Maria's No Frills 43.631152 -79.518617 Grocery Store
2121 Mimico NW, The Queensway West, South of Bloor,... 43.628841 -79.520999 McDonald's 43.630007 -79.518041 Fast Food Restaurant
2122 Mimico NW, The Queensway West, South of Bloor,... 43.628841 -79.520999 Koala Tan Tanning Salon & Sunless Spa 43.631370 -79.519006 Tanning Salon
2123 Mimico NW, The Queensway West, South of Bloor,... 43.628841 -79.520999 Value Village 43.631269 -79.518238 Thrift / Vintage Store
2124 Mimico NW, The Queensway West, South of Bloor,... 43.628841 -79.520999 Kingsway Boxing Club 43.627254 -79.526684 Gym

Let's check how many venues there are per neighborhood.

In [27]:
toronto_venues.groupby('Neighborhood').count()
Out[27]:
Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
Neighborhood
Agincourt 4 4 4 4 4 4
Alderwood, Long Branch 7 7 7 7 7 7
Bathurst Manor, Wilson Heights, Downsview North 21 21 21 21 21 21
Bayview Village 4 4 4 4 4 4
Bedford Park, Lawrence Manor East 25 25 25 25 25 25
Berczy Park 58 58 58 58 58 58
Birch Cliff, Cliffside West 4 4 4 4 4 4
Brockton, Parkdale Village, Exhibition Place 24 24 24 24 24 24
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport 16 16 16 16 16 16
Caledonia-Fairbanks 4 4 4 4 4 4
Cedarbrae 8 8 8 8 8 8
Central Bay Street 65 65 65 65 65 65
Christie 16 16 16 16 16 16
Church and Wellesley 79 79 79 79 79 79
Clairville, Humberwood, Woodbine Downs, West Humber, Kipling Heights, Rexdale, Elms, Tandridge, Old Rexdale 4 4 4 4 4 4
Clarks Corners, Tam O'Shanter, Sullivan 12 12 12 12 12 12
Cliffside, Cliffcrest, Scarborough Village West 4 4 4 4 4 4
Commerce Court, Victoria Hotel 100 100 100 100 100 100
Davisville 37 37 37 37 37 37
Davisville North 7 7 7 7 7 7
Del Ray, Mount Dennis, Keelsdale and Silverthorn 4 4 4 4 4 4
Don Mills North 5 5 5 5 5 5
Don Mills South 19 19 19 19 19 19
Dorset Park, Wexford Heights, Scarborough Town Centre 6 6 6 6 6 6
Downsview Central 3 3 3 3 3 3
Downsview East 3 3 3 3 3 3
Downsview Northwest 5 5 5 5 5 5
Downsview West 5 5 5 5 5 5
Dufferin, Dovercourt Village 14 14 14 14 14 14
Enclave of L4W 14 14 14 14 14 14
Enclave of M4L 18 18 18 18 18 18
Enclave of M5E 99 99 99 99 99 99
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood 7 7 7 7 7 7
Fairview, Henry Farm, Oriole 69 69 69 69 69 69
First Canadian Place, Underground city 100 100 100 100 100 100
Forest Hill North & West 4 4 4 4 4 4
Garden District, Ryerson 100 100 100 100 100 100
Glencairn 4 4 4 4 4 4
Golden Mile, Clairlea, Oakridge 9 9 9 9 9 9
Guildwood, Morningside, West Hill 9 9 9 9 9 9
Harbourfront East, Union Station, Toronto Islands 100 100 100 100 100 100
High Park, The Junction South 25 25 25 25 25 25
Hillcrest Village 5 5 5 5 5 5
Humber Summit 2 2 2 2 2 2
Humberlea, Emery 1 1 1 1 1 1
Humewood-Cedarvale 4 4 4 4 4 4
India Bazaar, The Beaches West 23 23 23 23 23 23
Kennedy Park, Ionview, East Birchmount Park 4 4 4 4 4 4
Kensington Market, Chinatown, Grange Park 66 66 66 66 66 66
Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens 4 4 4 4 4 4
Lawrence Manor, Lawrence Heights 15 15 15 15 15 15
Lawrence Park 3 3 3 3 3 3
Leaside 31 31 31 31 31 31
Little Portugal, Trinity 42 42 42 42 42 42
Malvern, Rouge 2 2 2 2 2 2
Milliken, Agincourt North, Steeles East, L'Amoreaux East 4 4 4 4 4 4
Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West 14 14 14 14 14 14
Moore Park, Summerhill East 2 2 2 2 2 2
New Toronto, Mimico South, Humber Bay Shores 13 13 13 13 13 13
North Park, Maple Leaf Park, Upwood Park 4 4 4 4 4 4
North Toronto West 20 20 20 20 20 20
Northwood Park, York University 6 6 6 6 6 6
Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South East 2 2 2 2 2 2
Ontario Provincial Government 30 30 30 30 30 30
Parkdale, Roncesvalles 14 14 14 14 14 14
Parkview Hill, Woodbine Gardens 10 10 10 10 10 10
Parkwoods 4 4 4 4 4 4
Regent Park, Harbourfront 42 42 42 42 42 42
Richmond, Adelaide, King 98 98 98 98 98 98
Rosedale 4 4 4 4 4 4
Roselawn 3 3 3 3 3 3
Rouge Hill, Port Union, Highland Creek 2 2 2 2 2 2
Runnymede, Swansea 33 33 33 33 33 33
Runnymede, The Junction North 3 3 3 3 3 3
Scarborough Village 1 1 1 1 1 1
South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens 10 10 10 10 10 10
St. James Town 81 81 81 81 81 81
St. James Town, Cabbagetown 43 43 43 43 43 43
Steeles West, L'Amoreaux West 13 13 13 13 13 13
Studio District 36 36 36 36 36 36
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park 13 13 13 13 13 13
The Annex, North Midtown, Yorkville 21 21 21 21 21 21
The Beaches 4 4 4 4 4 4
The Danforth East 2 2 2 2 2 2
The Danforth West, Riverdale 42 42 42 42 42 42
The Kingsway, Montgomery Road, Old Mill North 3 3 3 3 3 3
Thorncliffe Park 20 20 20 20 20 20
Toronto Dominion Centre, Design Exchange 100 100 100 100 100 100
University of Toronto, Harbord 33 33 33 33 33 33
Victoria Village 4 4 4 4 4 4
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale 2 2 2 2 2 2
Westmount 8 8 8 8 8 8
Wexford, Maryvale 6 6 6 6 6 6
Willowdale South 34 34 34 34 34 34
Willowdale West 5 5 5 5 5 5
Willowdale, Newtonbrook 2 2 2 2 2 2
Woburn 3 3 3 3 3 3
Woodbine Heights 6 6 6 6 6 6
York Mills West 2 2 2 2 2 2
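Every column in the `count()` output above repeats the same number. `groupby(...).size()` carries the same information as a single Series, which is easier to sort. Several central neighborhoods show exactly 100 venues. That most likely reflects the `LIMIT` passed to the explore request rather than the true number of venues. Here is a minimal sketch on a small made-up frame that uses the same column names as `toronto_venues`:

```python
import pandas as pd

# Made-up stand-in for toronto_venues
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "C", "C"],
    "Venue Category": ["Café", "Italian Restaurant", "Park",
                       "Gym", "Italian Restaurant", "Bakery"],
})

# One count per neighborhood, largest first
venue_counts = venues.groupby("Neighborhood").size().sort_values(ascending=False)
print(venue_counts)
```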

4.4. How many unique venue categories are there across all neighborhoods?

In [28]:
print('There are {} unique venue categories.'.format(len(toronto_venues['Venue Category'].unique())))
There are 272 unique venue categories.
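`Series.nunique()` returns the same count directly. One difference matters: `nunique()` skips missing values by default, while `len(series.unique())` counts `NaN` as a category. This small sketch uses made-up categories:

```python
import pandas as pd

categories = pd.Series(["Café", "Park", "Café", "Italian Restaurant", "Park"])
n_unique = categories.nunique()
print(f"There are {n_unique} unique venue categories.")
```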
In [29]:
print("The Unique Venue Categories are", toronto_venues['Venue Category'].unique())
The Unique Venue Categories are ['Fast Food Restaurant' 'Park' 'Convenience Store' 'Food & Drink Shop'
 'Hockey Arena' 'Portuguese Restaurant' 'Coffee Shop'
 'Financial or Legal Service' 'Bakery' 'Distribution Center' 'Spa'
 'Restaurant' 'Historic Site' 'Farmers Market' 'Chocolate Shop'
 'Performing Arts Venue' 'Pub' 'Breakfast Spot' 'Dessert Shop'
 'Event Space' 'Café' 'Bank' 'Theater' 'French Restaurant' 'Beer Store'
 'Art Gallery' 'Mexican Restaurant' 'Electronics Store'
 'Gym / Fitness Center' 'Yoga Studio' 'Antique Shop' 'Shoe Store'
 'Boutique' 'Vietnamese Restaurant' 'Clothing Store'
 'Furniture / Home Store' 'Accessories Store' 'Miscellaneous Shop'
 'Gift Shop' 'Sushi Restaurant' 'Diner' 'Italian Restaurant'
 'Japanese Restaurant' 'Burrito Place' 'Sandwich Place' 'Nightclub'
 'Hobby Shop' 'Bar' 'Gym' 'College Auditorium' 'Fried Chicken Joint'
 'Smoothie Shop' 'College Cafeteria' 'Creperie' 'Beer Bar' 'Print Shop'
 'Caribbean Restaurant' 'Gastropub' 'Pharmacy' 'Pizza Place'
 'Intersection' 'Flea Market' 'Athletics & Sports' 'Comic Shop'
 'Middle Eastern Restaurant' 'Music Venue' 'Plaza'
 'Modern European Restaurant' 'Hotel' 'Steakhouse' 'Tea Room'
 'Cosmetics Shop' 'Burger Joint' 'Ramen Restaurant' 'Bookstore'
 'Ethiopian Restaurant' 'College Rec Center' 'Video Game Store'
 'Shopping Mall' 'Chinese Restaurant' 'New American Restaurant'
 'Bubble Tea Shop' 'Thai Restaurant' 'Seafood Restaurant' 'Cocktail Bar'
 'Office' 'Movie Theater' 'Wine Bar' 'Poutine Place' 'Hookah Bar'
 'Department Store' 'Lingerie Store' 'Falafel Restaurant' 'Moving Target'
 'Sporting Goods Shop' 'Dim Sum Restaurant' 'Supermarket'
 'Asian Restaurant' 'Discount Store' 'Bike Shop' 'Grocery Store'
 'Skating Rink' 'Curling Ice' 'BBQ Joint' 'American Restaurant'
 'Vegetarian / Vegan Restaurant' 'Jazz Club' 'German Restaurant'
 'Comfort Food Restaurant' 'Lounge' 'Bistro' 'Moroccan Restaurant'
 'Irish Pub' 'Belgian Restaurant' 'Tailor Shop' 'Fountain'
 'Salon / Barbershop' 'Field' 'Playground' 'Trail' 'Liquor Store'
 'Pet Store' 'Rental Car Location' 'Donut Shop' 'Medical Center'
 'Health Food Store' 'Neighborhood' 'Museum' 'Cheese Shop'
 'Basketball Stadium' 'Bagel Shop' 'Greek Restaurant' 'Beach'
 'Concert Hall' 'Juice Bar' 'Fish Market' 'Eastern European Restaurant'
 'Gourmet Shop' "Women's Store" 'Pool' 'Korean BBQ Restaurant'
 'Sports Bar' 'Fish & Chips Shop' 'Brewery' 'Art Museum' 'Salad Place'
 'Ice Cream Shop' 'Indian Restaurant' 'Deli / Bodega' 'Korean Restaurant'
 'Metro Station' 'Candy Store' 'Baby Store' 'Hakka Restaurant'
 'Gas Station' 'Golf Course' 'Mediterranean Restaurant' 'Dog Run'
 'Bridal Shop' 'Mobile Phone Shop' 'Warehouse Store' 'Opera House'
 'Speakeasy' 'Monument / Landmark' 'Gluten-free Restaurant'
 'Brazilian Restaurant' 'Colombian Restaurant' 'Food Court' 'Soup Place'
 'Cupcake Shop' 'Latin American Restaurant' 'Noodle House' 'Building'
 'Smoke Shop' 'Toy / Game Store' 'Supplement Shop' 'Luggage Store'
 'Jewelry Store' 'Bus Station' 'Baseball Field' 'Massage Studio' 'Lake'
 'IT Services' 'Train Station' 'History Museum' 'Aquarium'
 'Scenic Lookout' 'Baseball Stadium' 'Indie Movie Theater' 'Hotel Bar'
 'Gym Pool' 'Cuban Restaurant' 'Record Shop' 'Malay Restaurant'
 "Men's Store" 'Airport' 'Tibetan Restaurant' 'Fruit & Vegetable Store'
 'Frozen Yogurt Shop' 'General Travel' 'General Entertainment'
 'Climbing Gym' 'Stadium' 'Bus Line' 'Soccer Field' 'Board Shop'
 'Light Rail Station' 'Molecular Gastronomy Restaurant' 'Basketball Court'
 'Construction & Landscaping' 'Motel' 'Home Service' 'Food Truck'
 'Stationery Store' 'Gay Bar' 'Coworking Space' 'Butcher'
 'College Stadium' 'Arts & Crafts Store' 'Swim School' 'Garden'
 'Cajun / Creole Restaurant' 'Auto Garage' 'Flower Shop'
 'Indoor Play Area' 'College Gym' 'College Arts Building' 'School'
 'Lawyer' 'Organic Grocery' 'Gaming Cafe' 'Doner Restaurant'
 'Bed & Breakfast' 'Hospital' 'Filipino Restaurant' 'Skate Park'
 'Harbor / Marina' 'Airport Lounge' 'Airport Food Court'
 'Airport Terminal' 'Plane' 'Airport Service' 'Sculpture Garden'
 'Boat or Ferry' 'Video Store' 'Camera Store' 'Church' 'Hostel'
 'Optical Shop' 'Poke Place' 'Drugstore' 'Garden Center' 'Truck Stop'
 'Taiwanese Restaurant' 'Snack Place' 'Market' 'River' 'Dance Studio'
 'Theme Restaurant' 'Adult Boutique' 'Escape Room' 'Sake Bar' 'Strip Club'
 'Martial Arts School' 'Health & Beauty Service' 'Auto Workshop'
 'Recording Studio' 'Wings Joint' 'Social Club' 'Hardware Store'
 'Tanning Salon' 'Thrift / Vintage Store']

4.5. Are there any Italian Restaurants present in the venues?

In [30]:
"Italian Restaurant" in toronto_venues['Venue Category'].unique()
Out[30]:
True
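The check above only says that at least one Italian restaurant exists somewhere. To see where they are, a boolean mask can be counted per neighborhood. Here is a minimal sketch on made-up rows with the same column names as `toronto_venues`:

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "C", "C", "C"],
    "Venue Category": ["Italian Restaurant", "Café", "Park",
                       "Italian Restaurant", "Italian Restaurant", "Bar"],
})

is_italian = (venues["Venue Category"] == "Italian Restaurant").astype(int)
# Sum the 0/1 mask per neighborhood; neighborhoods without any keep a count of 0
italian_counts = (is_italian.groupby(venues["Neighborhood"]).sum()
                  .sort_values(ascending=False))
print(italian_counts)
```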

5. Data Analysis

5.1. Analyzing each neighborhood

The "Venue Category" column contains categorical values, so we need to convert it to numerical values using one-hot encoding.

In [31]:
# one hot encoding
to_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
to_onehot['Neighborhoods'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [to_onehot.columns[-1]] + list(to_onehot.columns[:-1])
to_onehot = to_onehot[fixed_columns]

print("shape of dataset after one hot encoding is : ",to_onehot.shape)
to_onehot.head()
shape of dataset after one hot encoding is :  (2125, 273)
Out[31]:
Neighborhoods Accessories Store Adult Boutique Airport Airport Food Court Airport Lounge Airport Service Airport Terminal American Restaurant Antique Shop Aquarium Art Gallery Art Museum Arts & Crafts Store Asian Restaurant Athletics & Sports Auto Garage Auto Workshop BBQ Joint Baby Store Bagel Shop Bakery Bank Bar Baseball Field Baseball Stadium Basketball Court Basketball Stadium Beach Bed & Breakfast Beer Bar Beer Store Belgian Restaurant Bike Shop Bistro Board Shop Boat or Ferry Bookstore Boutique Brazilian Restaurant Breakfast Spot Brewery Bridal Shop Bubble Tea Shop Building Burger Joint Burrito Place Bus Line Bus Station Butcher Café Cajun / Creole Restaurant Camera Store Candy Store Caribbean Restaurant Cheese Shop Chinese Restaurant Chocolate Shop Church Climbing Gym Clothing Store Cocktail Bar Coffee Shop College Arts Building College Auditorium College Cafeteria College Gym College Rec Center College Stadium Colombian Restaurant Comfort Food Restaurant Comic Shop Concert Hall Construction & Landscaping Convenience Store Cosmetics Shop Coworking Space Creperie Cuban Restaurant Cupcake Shop Curling Ice Dance Studio Deli / Bodega Department Store Dessert Shop Dim Sum Restaurant Diner Discount Store Distribution Center Dog Run Doner Restaurant Donut Shop Drugstore Eastern European Restaurant Electronics Store Escape Room Ethiopian Restaurant Event Space Falafel Restaurant Farmers Market Fast Food Restaurant Field Filipino Restaurant Financial or Legal Service Fish & Chips Shop Fish Market Flea Market Flower Shop Food & Drink Shop Food Court Food Truck Fountain French Restaurant Fried Chicken Joint Frozen Yogurt Shop Fruit & Vegetable Store Furniture / Home Store Gaming Cafe Garden Garden Center Gas Station Gastropub Gay Bar General Entertainment General Travel German Restaurant Gift Shop Gluten-free Restaurant Golf Course Gourmet Shop Greek Restaurant Grocery Store Gym Gym / Fitness Center Gym Pool Hakka Restaurant Harbor / Marina Hardware Store Health & Beauty Service Health Food Store Historic Site History Museum Hobby Shop Hockey Arena Home Service Hookah Bar Hospital Hostel Hotel Hotel Bar IT Services Ice Cream Shop Indian Restaurant Indie Movie Theater Indoor Play Area Intersection Irish Pub Italian Restaurant Japanese Restaurant Jazz Club Jewelry Store Juice Bar Korean BBQ Restaurant Korean Restaurant Lake Latin American Restaurant Lawyer Light Rail Station Lingerie Store Liquor Store Lounge Luggage Store Malay Restaurant Market Martial Arts School Massage Studio Medical Center Mediterranean Restaurant Men's Store Metro Station Mexican Restaurant Middle Eastern Restaurant Miscellaneous Shop Mobile Phone Shop Modern European Restaurant Molecular Gastronomy Restaurant Monument / Landmark Moroccan Restaurant Motel Movie Theater Moving Target Museum Music Venue Neighborhood New American Restaurant Nightclub Noodle House Office Opera House Optical Shop Organic Grocery Park Performing Arts Venue Pet Store Pharmacy Pizza Place Plane Playground Plaza Poke Place Pool Portuguese Restaurant Poutine Place Print Shop Pub Ramen Restaurant Record Shop Recording Studio Rental Car Location Restaurant River Sake Bar Salad Place Salon / Barbershop Sandwich Place Scenic Lookout School Sculpture Garden Seafood Restaurant Shoe Store Shopping Mall Skate Park Skating Rink Smoke Shop Smoothie Shop Snack Place Soccer Field Social Club Soup Place Spa Speakeasy Sporting Goods Shop Sports Bar Stadium Stationery Store Steakhouse Strip Club Supermarket Supplement Shop Sushi Restaurant Swim School Tailor Shop Taiwanese Restaurant Tanning Salon Tea Room Thai Restaurant Theater Theme Restaurant Thrift / Vintage Store Tibetan Restaurant Toy / Game Store Trail Train Station Truck Stop Vegetarian / Vegan Restaurant Video Game Store Video Store Vietnamese Restaurant Warehouse Store Wine Bar Wings Joint Women's Store Yoga Studio
0 Parkwoods 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 Parkwoods 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 Parkwoods 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 Parkwoods 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 Victoria Village 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
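The encoding step is easier to follow on a tiny frame than on the 273-column output above. In this made-up example, `prefix=""` and `prefix_sep=""` keep the raw category names as column names. `dtype=int` is an added assumption: since pandas 2.0, `get_dummies` produces boolean columns by default, and `dtype=int` restores the 0/1 integers shown in the notebook output. `insert(0, ...)` places the neighborhood first, which has the same effect as the column reordering in the cell above.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Café", "Italian Restaurant", "Café"],
})

# One 0/1 column per category, named after the category itself
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
# Put the neighborhood name in the first column
onehot.insert(0, "Neighborhoods", venues["Neighborhood"])
print(onehot)
```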

Next, let's group the rows by neighborhood and take the mean of each category column. This gives the frequency of occurrence of each venue category in that neighborhood.

In [32]:
to_grouped = to_onehot.groupby(["Neighborhoods"]).mean().reset_index() 

print(to_grouped.shape)
to_grouped.head()
(99, 273)
Out[32]:
Neighborhoods Accessories Store Adult Boutique Airport Airport Food Court Airport Lounge Airport Service Airport Terminal American Restaurant Antique Shop Aquarium Art Gallery Art Museum Arts & Crafts Store Asian Restaurant Athletics & Sports Auto Garage Auto Workshop BBQ Joint Baby Store Bagel Shop Bakery Bank Bar Baseball Field Baseball Stadium Basketball Court Basketball Stadium Beach Bed & Breakfast Beer Bar Beer Store Belgian Restaurant Bike Shop Bistro Board Shop Boat or Ferry Bookstore Boutique Brazilian Restaurant Breakfast Spot Brewery Bridal Shop Bubble Tea Shop Building Burger Joint Burrito Place Bus Line Bus Station Butcher Café Cajun / Creole Restaurant Camera Store Candy Store Caribbean Restaurant Cheese Shop Chinese Restaurant Chocolate Shop Church Climbing Gym Clothing Store Cocktail Bar Coffee Shop College Arts Building College Auditorium College Cafeteria College Gym College Rec Center College Stadium Colombian Restaurant Comfort Food Restaurant Comic Shop Concert Hall Construction & Landscaping Convenience Store Cosmetics Shop Coworking Space Creperie Cuban Restaurant Cupcake Shop Curling Ice Dance Studio Deli / Bodega Department Store Dessert Shop Dim Sum Restaurant Diner Discount Store Distribution Center Dog Run Doner Restaurant Donut Shop Drugstore Eastern European Restaurant Electronics Store Escape Room Ethiopian Restaurant Event Space Falafel Restaurant Farmers Market Fast Food Restaurant Field Filipino Restaurant Financial or Legal Service Fish & Chips Shop Fish Market Flea Market Flower Shop Food & Drink Shop Food Court Food Truck Fountain French Restaurant Fried Chicken Joint Frozen Yogurt Shop Fruit & Vegetable Store Furniture / Home Store Gaming Cafe Garden Garden Center Gas Station Gastropub Gay Bar General Entertainment General Travel German Restaurant Gift Shop Gluten-free Restaurant Golf Course Gourmet Shop Greek Restaurant Grocery Store Gym Gym / Fitness Center Gym Pool Hakka Restaurant Harbor / Marina Hardware Store Health & Beauty Service Health Food Store Historic Site History Museum Hobby Shop Hockey Arena Home Service Hookah Bar Hospital Hostel Hotel Hotel Bar IT Services Ice Cream Shop Indian Restaurant Indie Movie Theater Indoor Play Area Intersection Irish Pub Italian Restaurant Japanese Restaurant Jazz Club Jewelry Store Juice Bar Korean BBQ Restaurant Korean Restaurant Lake Latin American Restaurant Lawyer Light Rail Station Lingerie Store Liquor Store Lounge Luggage Store Malay Restaurant Market Martial Arts School Massage Studio Medical Center Mediterranean Restaurant Men's Store Metro Station Mexican Restaurant Middle Eastern Restaurant Miscellaneous Shop Mobile Phone Shop Modern European Restaurant Molecular Gastronomy Restaurant Monument / Landmark Moroccan Restaurant Motel Movie Theater Moving Target Museum Music Venue Neighborhood New American Restaurant Nightclub Noodle House Office Opera House Optical Shop Organic Grocery Park Performing Arts Venue Pet Store Pharmacy Pizza Place Plane Playground Plaza Poke Place Pool Portuguese Restaurant Poutine Place Print Shop Pub Ramen Restaurant Record Shop Recording Studio Rental Car Location Restaurant River Sake Bar Salad Place Salon / Barbershop Sandwich Place Scenic Lookout School Sculpture Garden Seafood Restaurant Shoe Store Shopping Mall Skate Park Skating Rink Smoke Shop Smoothie Shop Snack Place Soccer Field Social Club Soup Place Spa Speakeasy Sporting Goods Shop Sports Bar Stadium Stationery Store Steakhouse Strip Club Supermarket Supplement Shop Sushi Restaurant Swim School Tailor Shop Taiwanese Restaurant Tanning Salon Tea Room Thai Restaurant Theater Theme Restaurant Thrift / Vintage Store Tibetan Restaurant Toy / Game Store Trail Train Station Truck Stop Vegetarian / Vegan Restaurant Video Game Store Video Store Vietnamese Restaurant Warehouse Store Wine Bar Wings Joint Women's Store Yoga Studio
0 Agincourt 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.00 0.0 0.0 0.0 0.0 0.00 0.00 0.0 0.0 0.00 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.00 0.25 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0
1 Alderwood, Long Branch 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.142857 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.000000 0.142857 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.00 0.0 0.0 0.0 0.0 0.00 0.00 0.0 0.0 0.00 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.00 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.000000 0.285714 0.0 0.142857 0.0 0.0 0.0 0.0 0.0 0.0 0.142857 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.142857 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0
2 Bathurst Manor, Wilson Heights, Downsview North 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.095238 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.047619 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.095238 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.047619 0.0 0.0 0.0 0.047619 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.047619 0.0 0.0 0.0 0.0 0.0 0.0 0.047619 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.047619 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.047619 0.00 0.0 0.0 0.0 0.0 0.00 0.00 0.0 0.0 0.00 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.00 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.047619 0.0 0.047619 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.047619 0.0 0.0 0.047619 0.047619 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.047619 0.0 0.0 0.0 0.0 0.047619 0.0 0.0 0.0 0.0 0.0 0.047619 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.047619 0.0 0.047619 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0
3 Bayview Village 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.250000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.00 0.0 0.0 0.0 0.0 0.00 0.25 0.0 0.0 0.00 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.00 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0
4 Bedford Park, Lawrence Manor East 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.04 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.04 0.04 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.080000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.04 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.04 0.0 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.04 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.04 0.040000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.04 0.0 0.0 0.0 0.0 0.08 0.00 0.0 0.0 0.04 0.0 0.0 0.0 0.00 0.0 0.0 0.0 0.04 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.040000 0.040000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.040000 0.0 0.0 0.0 0.0 0.040000 0.0 0.0 0.0 0.0 0.080000 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.040000 0.0 0.0 0.0 0.0 0.0 0.04 0.0 0.0 0.0 0.0 0.04 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.04 0.0
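Averaging 0/1 columns gives the share of a neighborhood's venues that fall in each category. For example, Agincourt has 4 venues, so each of its categories that appears once scores 0.25. Here is a small check on made-up data:

```python
import pandas as pd

onehot = pd.DataFrame({
    "Neighborhoods": ["A", "A", "A", "A", "B", "B"],
    "Italian Restaurant": [1, 0, 0, 0, 1, 1],
    "Café": [0, 1, 1, 0, 0, 0],
})

# Mean of a 0/1 column = fraction of the neighborhood's venues in that category
grouped = onehot.groupby("Neighborhoods").mean().reset_index()
print(grouped)
```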

For the clustering we only need the "Neighborhoods" and "Italian Restaurant" columns, so we select just these two.

In [33]:
ita = to_grouped[["Neighborhoods","Italian Restaurant"]]
ita.head()
Out[33]:
Neighborhoods Italian Restaurant
0 Agincourt 0.00
1 Alderwood, Long Branch 0.00
2 Bathurst Manor, Wilson Heights, Downsview North 0.00
3 Bayview Village 0.00
4 Bedford Park, Lawrence Manor East 0.08
In [34]:
# rename column "Neighborhoods" to "Neighborhood"
ita = ita.rename(columns={'Neighborhoods':'Neighborhood'})

6. Clustering

We will use k-means clustering, but first we will choose the value of K using the elbow method.

6.1. Elbow Method

In [35]:
# drop "Neighborhood" column from the dataframe
X = ita.drop(['Neighborhood'], axis=1)
In [36]:
# find 'k' value by Elbow Method
plt.figure(figsize=[10, 8])
inertia = []
range_val = range(2, 20)
for i in range_val:
    # fixed random_state so the curve is reproducible between runs
    kmean = KMeans(n_clusters=i, random_state=0)
    kmean.fit(X)
    inertia.append(kmean.inertia_)  # within-cluster sum of squared distances
plt.plot(range_val, inertia, 'bx-')
plt.xlabel('Values of K')
plt.ylabel('Inertia')
plt.title('The Elbow Method using Inertia')
plt.show()
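Reading the elbow off a plot is subjective. One way to make the choice repeatable is to pick the K where the inertia curve bends most sharply, which is the largest discrete second difference. This is an assumed heuristic, not the method used in this notebook. `elbow_k` is a hypothetical helper, and the inertia values below are made up. In the notebook, the inputs would be `list(range_val)` and `inertia`.

```python
def elbow_k(ks, inertias):
    """Return the k with the largest second difference of the inertia curve.

    ks must be consecutive integers and inertias the matching values.
    The first and last k have no neighbours on both sides, so they are never returned.
    """
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three matching (k, inertia) pairs")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        # Large positive value: steep drop before k, flat curve after k
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k

ks = [2, 3, 4, 5, 6]
inertias = [100.0, 70.0, 20.0, 17.0, 15.0]  # made-up values
print(elbow_k(ks, inertias))
```

This heuristic is sensitive to noise in the curve. It is best used as a cross-check on the visual reading, not as a replacement for it.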

From the elbow plot, the optimal value of K appears to be 4, so we group the neighborhoods into 4 clusters.

In [37]:
kclusters = 4

toronto_grouped_clustering = ita.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
Out[37]:
array([0, 0, 0, 0, 2, 3, 0, 1, 0, 0])
In [38]:
# unique cluster labels
np.unique(kmeans.labels_)
Out[38]:
array([0, 1, 2, 3])

Now create a new dataframe that includes the cluster label of each neighborhood, and then join it with the venues found for each neighborhood.

In [39]:
to_merged = ita.copy()

# add clustering labels
to_merged["Cluster Labels"] = kmeans.labels_
In [40]:
to_merged.head()
Out[40]:
Neighborhood Italian Restaurant Cluster Labels
0 Agincourt 0.00 0
1 Alderwood, Long Branch 0.00 0
2 Bathurst Manor, Wilson Heights, Downsview North 0.00 0
3 Bayview Village 0.00 0
4 Bedford Park, Lawrence Manor East 0.08 2
In [41]:
# join with toronto_venues to add each neighborhood's latitude/longitude and its nearby venues (one row per venue)
to_merged = to_merged.join(toronto_venues.set_index("Neighborhood"), on="Neighborhood")

print(to_merged.shape)
to_merged.head()
(2125, 9)
Out[41]:
Neighborhood Italian Restaurant Cluster Labels Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
0 Agincourt 0.0 0 43.794200 -79.262029 Panagio's Breakfast & Lunch 43.792370 -79.260203 Breakfast Spot
0 Agincourt 0.0 0 43.794200 -79.262029 El Pulgarcito 43.792648 -79.259208 Latin American Restaurant
0 Agincourt 0.0 0 43.794200 -79.262029 Twilight 43.791999 -79.258584 Lounge
0 Agincourt 0.0 0 43.794200 -79.262029 Commander Arena 43.794867 -79.267989 Skating Rink
1 Alderwood, Long Branch 0.0 0 43.602414 -79.543484 Il Paesano Pizzeria & Restaurant 43.601280 -79.545028 Pizza Place
In [42]:
# sort the results by Cluster Labels
print(to_merged.shape)
to_merged.sort_values(["Cluster Labels"], inplace=True)
to_merged.tail()
(2125, 9)
Out[42]:
Neighborhood Italian Restaurant Cluster Labels Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
53 Little Portugal, Trinity 0.023810 3 43.647927 -79.419750 Bellwoods Brewery Bottle Shop 43.647120 -79.420044 Beer Store
53 Little Portugal, Trinity 0.023810 3 43.647927 -79.419750 Tiger of Sweden Toronto 43.645474 -79.419395 Men's Store
53 Little Portugal, Trinity 0.023810 3 43.647927 -79.419750 Montgomery's 43.644273 -79.418521 Restaurant
48 Kensington Market, Chinatown, Grange Park 0.015152 3 43.653206 -79.400049 Pancho's Bakery 43.654750 -79.402105 Bakery
40 Harbourfront East, Union Station, Toronto Islands 0.020000 3 43.640816 -79.381752 The Chartroom Bar & Lounge 43.640486 -79.376044 Hotel Bar

Let's check how many Italian restaurants there are among the venues.

In [43]:
to_merged['Venue Category'].value_counts()['Italian Restaurant']
Out[43]:
46

We see that 46 of the venues returned by Foursquare for Toronto's neighborhoods are Italian restaurants.
We will create a new dataframe with each Neighborhood and its Italian Restaurants.
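A minimal sketch of such a dataframe, using a small made-up stand-in for `to_merged` (only the "Neighborhood" and "Venue Category" columns are needed): mark each venue row that is an Italian restaurant and sum those flags per neighborhood, so neighborhoods without one get a count of 0. The names `venues` and `italian_counts` are illustrative.

```python
import pandas as pd

# made-up stand-in for to_merged: one row per venue
venues = pd.DataFrame({
    "Neighborhood": ["Agincourt", "Agincourt", "Christie", "Christie", "Don Mills South"],
    "Venue Category": ["Lounge", "Skating Rink", "Italian Restaurant", "Park", "Italian Restaurant"],
})

# True/False per venue row, summed per neighborhood (False counts as 0)
is_italian = venues["Venue Category"].eq("Italian Restaurant")
italian_counts = (
    is_italian.groupby(venues["Neighborhood"]).sum()
    .rename("Italian Restaurants")
    .reset_index()
)
print(italian_counts)
```

Summing the per-neighborhood counts gives the same total as `value_counts()['Italian Restaurant']` above.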

6.2. Visualize Clusters on a Map

In [44]:
# create map
map_clusters = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# to_merged has one row per venue, so keep one row per neighborhood for the markers
neighborhoods = to_merged.drop_duplicates(subset='Neighborhood')

# add markers to the map
for lat, lon, poi, cluster in zip(neighborhoods['Neighborhood Latitude'], neighborhoods['Neighborhood Longitude'], neighborhoods['Neighborhood'], neighborhoods['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster))
    # note: cluster-1 shifts the palette by one, so label 0 takes the last colour
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill_color=rainbow[cluster-1],
        fill_opacity=0.8).add_to(map_clusters)

map_clusters
Out[44]:

Warning:

Running the cell above displays the clusters on an interactive Folium map, but GitHub's notebook viewer does not render interactive Folium maps. So I've added a static image of the map in the next cell, uploaded from my drive.

Map Visualization

6.3. How many Neighborhoods per Cluster?

In [45]:
ita["Cluster Labels"] = kmeans.labels_
ita.head()
Out[45]:
Neighborhood Italian Restaurant Cluster Labels
0 Agincourt 0.00 0
1 Alderwood, Long Branch 0.00 0
2 Bathurst Manor, Wilson Heights, Downsview North 0.00 0
3 Bayview Village 0.00 0
4 Bedford Park, Lawrence Manor East 0.08 2
In [46]:
objects = (1, 2, 3, 4)
y_pos = np.arange(len(objects))
# number of neighborhoods per cluster label, ordered by label (0, 1, 2, 3)
perf = ita['Cluster Labels'].value_counts().sort_index().tolist()
plt.bar(y_pos, perf, align='center', alpha=0.8, color=['red', 'purple', 'aquamarine', 'darkkhaki'])
plt.xticks(y_pos, objects)
plt.ylabel('Number of Neighborhoods')
plt.xlabel('Cluster')
plt.title('How many Neighborhoods per Cluster')

plt.show()
In [47]:
# How many neighborhoods in each cluster

ita['Cluster Labels'].value_counts()
Out[47]:
0    73
3    11
1     9
2     6
Name: Cluster Labels, dtype: int64
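Note that `KMeans` numbers its labels from 0, while the bar chart and the sections below call them Cluster 1 to Cluster 4. A small sketch, using a made-up label array in place of `kmeans.labels_`, that shifts the labels to those 1-based names and counts neighborhoods in cluster order:

```python
import numpy as np
import pandas as pd

# made-up k-means labels (0-based), standing in for kmeans.labels_
labels = np.array([0, 0, 3, 1, 0, 2, 3, 0])

counts = (
    pd.Series(labels + 1, name="Cluster")  # shift to the 1-based names used in the text
    .value_counts()
    .sort_index()                          # Cluster 1, 2, 3, 4 in order
)
print(counts)
```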

6.4. Analysis of each Cluster

In [48]:
# create a dataframe with the borough of each neighborhood, which we will merge with each cluster dataframe

df_new = df[['Borough', 'Neighborhood']]
df_new.head()
Out[48]:
Borough Neighborhood
0 North York Parkwoods
1 North York Victoria Village
2 Downtown Toronto Regent Park, Harbourfront
3 North York Lawrence Manor, Lawrence Heights
4 Queen's Park Ontario Provincial Government

Cluster 1

In [49]:
# Red 

cluster1 = to_merged.loc[to_merged['Cluster Labels'] == 0]
df_cluster1 = pd.merge(df_new, cluster1, on='Neighborhood')
df_cluster1.head()
Out[49]:
Borough Neighborhood Italian Restaurant Cluster Labels Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
0 North York Parkwoods 0.0 0 43.753259 -79.329656 Brookbanks Park 43.751976 -79.332140 Park
1 North York Parkwoods 0.0 0 43.753259 -79.329656 KFC 43.754387 -79.333021 Fast Food Restaurant
2 North York Parkwoods 0.0 0 43.753259 -79.329656 649 Variety 43.754513 -79.331942 Convenience Store
3 North York Parkwoods 0.0 0 43.753259 -79.329656 Variety Store 43.751974 -79.333114 Food & Drink Shop
4 North York Victoria Village 0.0 0 43.725882 -79.315572 Cash Money 43.725486 -79.312665 Financial or Legal Service

Cluster 2

In [50]:
# Purple 
cluster2=to_merged.loc[to_merged['Cluster Labels'] == 1]
df_cluster2 = pd.merge(df_new, cluster2, on='Neighborhood')
df_cluster2.head()
Out[50]:
Borough Neighborhood Italian Restaurant Cluster Labels Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
0 North York Don Mills South 0.052632 1 43.7259 -79.340923 Sorento Restaurant 43.726575 -79.341989 Italian Restaurant
1 North York Don Mills South 0.052632 1 43.7259 -79.340923 Fitness Connection 43.727473 -79.341707 Gym
2 North York Don Mills South 0.052632 1 43.7259 -79.340923 Tilley Endurables 43.727033 -79.342926 Clothing Store
3 North York Don Mills South 0.052632 1 43.7259 -79.340923 The Beer Store 43.726987 -79.341494 Beer Store
4 North York Don Mills South 0.052632 1 43.7259 -79.340923 Swiss Chalet 43.726747 -79.341625 Restaurant

Cluster 3

In [51]:
# Blue
cluster3 = to_merged.loc[to_merged['Cluster Labels'] == 2]
df_cluster3 = pd.merge(df_new, cluster3, on='Neighborhood')
df_cluster3.head()
Out[51]:
Borough Neighborhood Italian Restaurant Cluster Labels Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
0 Downtown Toronto Christie 0.0625 2 43.669542 -79.422564 Fiesta Farms 43.668471 -79.420485 Grocery Store
1 Downtown Toronto Christie 0.0625 2 43.669542 -79.422564 Stubbe Chocolates 43.671566 -79.421289 Candy Store
2 Downtown Toronto Christie 0.0625 2 43.669542 -79.422564 Marlenes Just Babies 43.671824 -79.420499 Baby Store
3 Downtown Toronto Christie 0.0625 2 43.669542 -79.422564 Dupont Disco 43.670490 -79.426611 Nightclub
4 Downtown Toronto Christie 0.0625 2 43.669542 -79.422564 Marian Engel Park 43.673754 -79.423988 Park

Cluster 4

In [52]:
# Turquoise
cluster4 = to_merged.loc[to_merged['Cluster Labels'] == 3]
df_cluster4 = pd.merge(df_new, cluster4, on='Neighborhood')
df_cluster4.head()
Out[52]:
Borough Neighborhood Italian Restaurant Cluster Labels Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
0 Queen's Park Ontario Provincial Government 0.033333 3 43.662301 -79.389494 Coach House Restaurant 43.664991 -79.384814 Diner
1 Queen's Park Ontario Provincial Government 0.033333 3 43.662301 -79.389494 Tim Hortons 43.658906 -79.388696 Coffee Shop
2 Queen's Park Ontario Provincial Government 0.033333 3 43.662301 -79.389494 Hart House Gym 43.664172 -79.394888 Gym
3 Queen's Park Ontario Provincial Government 0.033333 3 43.662301 -79.389494 Convocation Hall 43.660828 -79.395245 College Auditorium
4 Queen's Park Ontario Provincial Government 0.033333 3 43.662301 -79.389494 Flock Rotisserie + Greens 43.659167 -79.389475 Fried Chicken Joint
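The four cells above repeat the same filter-and-merge step. A compact alternative, sketched with small made-up stand-ins for `df_new` and `to_merged`, builds every cluster dataframe in one loop, keyed by the 1-based cluster names used in the text. The name `cluster_frames` is illustrative.

```python
import pandas as pd

# made-up stand-ins for df_new (borough lookup) and to_merged (one row per venue)
df_new = pd.DataFrame({
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Don Mills South", "Christie"],
})
to_merged = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Don Mills South", "Christie"],
    "Cluster Labels": [0, 0, 1, 2],
    "Venue": ["Brookbanks Park", "KFC", "Sorento Restaurant", "Fiesta Farms"],
})

# {1: rows of label 0 with borough added, 2: rows of label 1, ...}
cluster_frames = {
    label + 1: pd.merge(df_new, group, on="Neighborhood")
    for label, group in to_merged.groupby("Cluster Labels")
}
for name, frame in cluster_frames.items():
    print(f"Cluster {name}: {frame['Neighborhood'].nunique()} neighborhood(s)")
```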

6.5. Number of neighborhoods per cluster vs Average frequency of Italian Restaurants in each Cluster

In [53]:
plt.figure(figsize=(15, 5))

# Plot-1 ( Number of Neighborhoods per Cluster )

plt.subplot(1, 2, 1)
objects = (1, 2, 3, 4)
y_pos = np.arange(len(objects))
# number of neighborhoods per cluster label, ordered by label (0, 1, 2, 3)
perf_1 = ita['Cluster Labels'].value_counts().sort_index().tolist()
plt.bar(y_pos, perf_1, align='center', alpha=0.8, color=['red', 'purple', 'aquamarine', 'darkkhaki'])
plt.xticks(y_pos, objects)
plt.ylabel('Number of Neighborhoods')
plt.xlabel('Cluster')
plt.title('Number of Neighborhoods per Cluster')

# Plot-2 ( Average frequency of Italian Restaurants per Cluster )
# df_cluster1..4 have one row per venue, so each mean is weighted by how many venues a neighborhood has

plt.subplot(1, 2, 2)
clusters_mean = [df_cluster1['Italian Restaurant'].mean(), df_cluster2['Italian Restaurant'].mean(), df_cluster3['Italian Restaurant'].mean(),
                 df_cluster4['Italian Restaurant'].mean()]
y_pos = np.arange(len(objects))
perf_2 = clusters_mean
plt.bar(y_pos, perf_2, align='center', alpha=0.8, color=['red', 'purple', 'aquamarine', 'darkkhaki'])
plt.xticks(y_pos, objects)
plt.ylabel('Mean')
plt.xlabel('Cluster')
plt.title('Average number of Italian Restaurants per Cluster')
Out[53]:
Text(0.5, 1.0, 'Average number of Italian Restaurants per Cluster')
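Because `df_cluster1` to `df_cluster4` hold one row per venue, the means in Plot-2 weight each neighborhood by how many venues Foursquare returned for it. If an unweighted average over neighborhoods is wanted instead (an alternative, not what the plot above computes), it can be taken directly from `ita`, which has one row per neighborhood. A sketch with a made-up stand-in for `ita`:

```python
import pandas as pd

# made-up stand-in for ita: one row per neighborhood
ita_demo = pd.DataFrame({
    "Neighborhood": ["Agincourt", "Parkwoods", "Don Mills South", "Christie", "Bedford Park"],
    "Italian Restaurant": [0.0, 0.0, 0.05, 0.06, 0.08],
    "Cluster Labels": [0, 0, 1, 2, 2],
})

# one mean per cluster label, each neighborhood counted once
per_neighborhood_mean = ita_demo.groupby("Cluster Labels")["Italian Restaurant"].mean()
print(per_neighborhood_mean)
```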

7. Conclusion

The neighborhoods located in the East Toronto area (cluster 3) have the highest average frequency of Italian restaurants, represented by the aquamarine colour. North York has the second highest number of Italian restaurants. Looking at the nearby venues, the optimum place to open a new Italian restaurant is Victoria Village, North York (cluster 1). Cluster 1 groups 73 neighborhoods with the lowest frequency of Italian restaurants (many, such as Agincourt and Victoria Village, have none among their nearby venues), which leaves little competition and gives a good opportunity for a new restaurant. The second-best neighborhoods would be areas such as Queen's Park, which is in cluster 4. This concludes the findings of this project, which recommend that the entrepreneur open an authentic Italian restaurant in these locations with little to no competition. Nonetheless, if the food is authentic, affordable and tastes good, I am confident that it will gain a following anywhere.

Here we used Italian restaurants as an example. The same process can be used to find the best place or neighborhood

  • to open a start-up company
  • to rent a place to stay, for bachelors
  • to start a side business for middle-class people
  • to set up a camp for any kind of survey
  • to release a new product for checking the success rate
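To reuse the pipeline for another venue type, only the category column changes. The sketch below wraps the select-and-cluster steps in a hypothetical helper `cluster_by_category`. It assumes a frequency table shaped like `to_grouped` (a "Neighborhoods" column plus one frequency column per venue category) and uses scikit-learn's `KMeans` as in section 6; the sample table is made up.

```python
import pandas as pd
from sklearn.cluster import KMeans

def cluster_by_category(grouped, category, k, random_state=0):
    """Hypothetical helper: cluster neighborhoods on one venue-category frequency.

    `grouped` is assumed to look like to_grouped: a "Neighborhoods" column
    plus one frequency column per venue category.
    """
    result = (
        grouped[["Neighborhoods", category]]
        .rename(columns={"Neighborhoods": "Neighborhood"})
        .copy()
    )
    model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
    result["Cluster Labels"] = model.fit_predict(result[[category]])
    return result

# made-up frequency table with two categories
grouped_demo = pd.DataFrame({
    "Neighborhoods": ["A", "B", "C", "D", "E", "F"],
    "Gym": [0.0, 0.0, 0.01, 0.10, 0.11, 0.12],
    "Italian Restaurant": [0.0, 0.04, 0.0, 0.0, 0.05, 0.0],
})
gyms = cluster_by_category(grouped_demo, "Gym", k=2)
print(gyms)
```

The value of `k` would still need to be chosen per category, for example with the elbow method from section 6.1.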

8. Future Work

  • Apply different types of clustering algorithms to cluster the neighborhoods.
  • Consider other food venues, market areas, etc. as features for clustering.
  • Consider more than 100 venues per neighborhood in the analysis using the Foursquare API.
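As a starting point for the first item, scikit-learn's `AgglomerativeClustering` can be swapped in for `KMeans` on the same input. The sketch below uses a made-up single-feature array shaped like the "Italian Restaurant" column, keeps the default Ward linkage, and compares the two groupings with `adjusted_rand_score`, where 1.0 means the partitions are identical regardless of how the labels are numbered.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score

# made-up Italian-restaurant frequencies, one row per neighborhood
X_demo = np.array([[0.0], [0.0], [0.01], [0.03], [0.035], [0.06], [0.08]])

km_labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X_demo)
agg_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_demo)

# label numbers are arbitrary, so compare the groupings rather than the raw labels
print(adjusted_rand_score(km_labels, agg_labels))
```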